Iterative Embedding with Robust Correction using Feedback of Error Observed
Abstract: Today's nonlinear dimensionality reduction techniques are highly sensitive to outliers. Almost all of them are spectral methods that differ mainly in how they compute neighborhood similarities among the high-dimensional input points, and all aim to preserve this similarity structure in the low-dimensional output. Unwanted outliers in the data directly disrupt the preservation of neighborhood similarities among the majority non-outlier points: occurring in the majority, these points must simultaneously satisfy the similarities they form with the outliers and those they form with the other non-outlier points. This conflict distorts the intrinsic structure of the manifold on which the non-outlier data lies when that structure is preserved via a homeomorphism onto a low-dimensional manifold. In this paper we propose an iterative algorithm that analytically solves for a nonlinear embedding with a monotonic improvement at each iteration. As an application of this iterative manifold learning algorithm, we develop a framework that decomposes the pairwise error observed between all pairs of points and dynamically updates the neighborhood similarity matrix to downplay the effect of outliers on the embedding of the majority non-outlier data. Preliminary work. Under review by MLIS 2015. Do not distribute.
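The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea it describes: alternate a spectral embedding step with an error-driven downweighting of the similarity matrix, so that pairs whose similarities are poorly preserved (outlier links) lose influence over later iterations. The function names, the Gaussian-kernel similarities, and the specific IRLS-style reweighting rule are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def gaussian_similarities(X, sigma=1.0):
    # Pairwise Gaussian-kernel similarities among rows of X (an assumed choice).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def iterative_robust_embedding(X, dim=2, n_iter=10):
    """Alternate spectral embedding with error-driven similarity reweighting."""
    W = gaussian_similarities(X)
    Y = None
    for _ in range(n_iter):
        # Spectral step: embed via the eigenvectors of the graph Laplacian
        # built from the current (reweighted) similarity matrix.
        D = np.diag(W.sum(axis=1))
        L = D - W
        _, vecs = np.linalg.eigh(L)
        Y = vecs[:, 1:dim + 1]  # skip the trivial constant eigenvector
        # Pairwise error: how badly each pair's similarity is preserved
        # by the current low-dimensional embedding.
        d2_low = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        err = np.abs(W - np.exp(-d2_low))
        # Downweight high-error pairs (IRLS-style; the paper's actual
        # decomposition and update rule may differ).
        W = W / (1.0 + err)
    return Y
```

Because `err` is symmetric, `W` stays symmetric across iterations, so the Laplacian eigendecomposition remains well defined at every step.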
NoPeek: Information leakage reduction to share activations in distributed deep learning
For distributed machine learning with sensitive data, we demonstrate how
minimizing distance correlation between raw data and intermediary
representations reduces leakage of sensitive raw data patterns across client
communications while maintaining model accuracy. Leakage (measured using
distance correlation between input and intermediate representations) is the
risk associated with the invertibility of raw data from intermediary
representations. This can prevent client entities that hold sensitive data from
using distributed deep learning services. We demonstrate on image datasets that
our method, based on reducing the distance correlation between raw data and
learned representations during training and inference, is resilient to such
reconstruction attacks. It prevents reconstruction of raw data while
maintaining the information required to sustain good classification accuracy.
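The leakage measure the abstract refers to, sample distance correlation, can be computed directly from row-matched batches of raw inputs and intermediate activations. Below is a minimal NumPy sketch of that statistic; the function names are our own, and combining it with a task loss (as noted in the comment) is an assumption about how such a term would be used, not the paper's exact objective.

```python
import numpy as np

def pairwise_dist(Z):
    # Euclidean distance matrix between rows of Z.
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

def distance_correlation(X, Y):
    """Sample distance correlation between row-matched samples X and Y.

    Returns a value in [0, 1]; 0 iff (in the population) X and Y are
    independent, 1 when Y is a similarity transform of X.
    A training objective could add alpha * distance_correlation(inputs,
    activations) to the task loss to penalize leakage (assumed usage).
    """
    a, b = pairwise_dist(X), pairwise_dist(Y)
    # Double-center each distance matrix.
    A = a - a.mean(0, keepdims=True) - a.mean(1, keepdims=True) + a.mean()
    B = b - b.mean(0, keepdims=True) - b.mean(1, keepdims=True) + b.mean()
    dcov2_xy = (A * B).mean()   # squared sample distance covariance
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))
```

Unlike Pearson correlation, this statistic captures nonlinear dependence and accepts X and Y of different dimensionality, which is what makes it usable between raw images and flattened intermediate activations.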